159 research outputs found

Self-coordination of parameter conflicts in D-SON architectures: a Markov decision process framework

Author: Jessica Moysen
L Panait
Lorenza Giupponi
S Weidenholzer
Publication venue: 'Springer Science and Business Media LLC'

A visual demonstration of convergence properties of cooperative coevolution

Author: A. Liekens
J. Hofbauer
J. Juliany
J. Maynard-Smith
K. Alligood
L. Panait
M. Chang
M. Potter
M.A. Potter
R. Eriksson
R.P. Wiegand
S. Ficici
W.M. Spears
Publication venue: Springer
Publication date: 01/01/2004

We introduce a model for cooperative coevolutionary algorithms (CCEAs) using partial mixing, which allows us to compute the expected long-run convergence of such algorithms when individuals' fitness is based on the maximum payoff of some N evaluations with partners chosen at random from the other population. Using this model, we devise novel visualization mechanisms to attempt to qualitatively explain a difficult-to-conceptualize pathology in CCEAs: the tendency for them to converge to suboptimal Nash equilibria. We further demonstrate visually how increasing the size of N, or biasing the fitness to include an ideal-collaboration factor, both improve the likelihood of optimal convergence, and under which initial population configurations they are not much help.
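The fitness rule named in this abstract (the maximum payoff over N evaluations with randomly chosen partners) can be illustrated with a minimal Python sketch. The payoff matrix, the population make-up, and the names `collaborative_fitness` and `PAYOFF` are assumptions made for illustration; they are not taken from the paper.

```python
import random

# Hypothetical 2x2 coordination game (illustrative values only):
# joint strategy (1, 1) is the optimum, (0, 0) a suboptimal equilibrium.
PAYOFF = [[5, 0],
          [0, 10]]


def payoff(i, j):
    """Payoff when strategy i meets partner strategy j."""
    return PAYOFF[i][j]


def collaborative_fitness(individual, partners, payoff_fn, n_evals, rng):
    """Fitness = best payoff over n_evals partners drawn at random with replacement."""
    return max(payoff_fn(individual, rng.choice(partners)) for _ in range(n_evals))


if __name__ == "__main__":
    rng = random.Random(0)
    partners = [0] * 9 + [1]  # the other population mostly plays strategy 0
    for n in (1, 5, 50):
        samples = [collaborative_fitness(1, partners, payoff, n, rng) for _ in range(1000)]
        print(n, sum(samples) / len(samples))
```

With few evaluations, strategy 1 rarely meets its ideal partner and looks poor; with more evaluations its estimated fitness tends toward the optimal payoff, which is the effect of increasing N that the abstract describes.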

CiteSeerX

Crossref

Rational bidding using reinforcement learning: an application in automated resource allocation

Author: A. Sherstov
C. Watkins
D. Gode
D. Reeves
E. Medernach
H.J. Herik van den
I. Erev
K. Lai
L. Panait
M. He
M. Kearns
M. Wellman
P. Green
R. Luce
S. Kaplan
T. Saaty
W. Smith
Y. Shoham
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

The application of autonomous agents to the provisioning and usage of computational resources is an attractive research field. Various methods and technologies from artificial intelligence, statistics and economics work together to i) achieve autonomic provisioning and usage of computational resources, ii) devise competitive bidding strategies for widely used market mechanisms, and iii) incentivize consumers and providers to use such market-based systems. The contributions of the paper are threefold. First, we present a framework for supporting consumers and providers in technical and economic preference elicitation and the generation of bids. Secondly, we introduce a consumer-side reinforcement learning bidding strategy which enables rational behavior in the generation and selection of bids. Thirdly, we evaluate and compare this bidding strategy against a truth-telling bidding strategy for two kinds of market mechanisms: one centralized and one decentralized.

Crossref

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Q-Strategy: A Bidding Strategy for Market-Based Allocation of Grid Services

Author: A. Sherstov
C. Watkins
D. Cliff
D. Gode
D. Minoli
E. Medernach
H.J. Herik van den
I. Erev
K. Lai
L. Panait
M. He
M. Wellman
P. Green
R. Luce
R. Wolski
S. Gjerstad
S. Kaplan
T. Saaty
W. Smith
Y. Shoham
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

The application of autonomous agents to the provisioning and usage of computational services is an attractive research field. Various methods and technologies from artificial intelligence, statistics and economics work together to i) achieve autonomic provisioning and usage of Grid services, ii) devise competitive bidding strategies for widely used market mechanisms, and iii) incentivize consumers and providers to use such market-based systems. The contributions of the paper are threefold. First, we present a bidding agent framework for implementing artificial bidding agents, supporting consumers and providers in technical and economic preference elicitation as well as automated bid generation for the requesting and provisioning of Grid services. Secondly, we introduce a novel consumer-side bidding strategy, which enables goal-oriented and strategic behavior in the generation and submission of consumer service requests and the selection of provider offers. Thirdly, we evaluate and compare the Q-strategy, implemented within the presented framework, against the Truth-Telling bidding strategy in three mechanisms: a centralized CDA, a decentralized on-line machine scheduling mechanism and a FIFO-scheduling mechanism.

Crossref

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Continuous Strategy Replicator Dynamics for Multi-Agent Learning

Author: Aram Galstyan
J. Hofbauer
J. Hu
J. Oechssler
K. Tuyls
L. Busoniu
L. M. Wahl
L. P. Kaelbling
L. Panait
M. Bowling
P. Stone
R. Cressman
R. S. Sutton
S. Abdallah
S. Le
T. Borgers
T. Killingback
Y. Sato
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/09/2011

The problem of multi-agent learning and adaptation has attracted a great deal of attention in recent years. It has been suggested that the dynamics of multi-agent learning can be studied using replicator equations from population biology. Most existing studies so far have been limited to discrete strategy spaces with a small number of available actions. In many cases, however, the choices available to agents are better characterized by continuous spectra. This paper suggests a generalization of the replicator framework that allows the study of the adaptive dynamics of Q-learning agents with continuous strategy spaces. Instead of probability vectors, agents' strategies are now characterized by probability measures over continuous variables. As a result, the ordinary differential equations for the discrete case are replaced by a system of coupled integral-differential replicator equations that describe the mutual evolution of individual agent strategies. We derive a set of functional equations describing the steady state of the replicator dynamics, examine their solutions for several two-player games, and confirm our analytical results using simulations. Comment: 12 pages, 15 figures, accepted for publication in JAAMA
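For orientation, the discrete-strategy case that this abstract generalizes can be sketched with the standard single-population replicator equation, dx_i/dt = x_i((Ax)_i - x·Ax), integrated with a simple Euler step. This is the textbook form, not the paper's continuous-strategy or Q-learning variant; the payoff matrix, step size, and function names are assumptions chosen for illustration.

```python
def replicator_step(x, A, dt):
    """One Euler step of dx_i/dt = x_i * ((A x)_i - x . A x)."""
    fitness = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    mean_fitness = sum(xi * fi for xi, fi in zip(x, fitness))
    new_x = [xi + dt * xi * (fi - mean_fitness) for xi, fi in zip(x, fitness)]
    total = sum(new_x)
    # Renormalise to guard against floating-point drift off the simplex.
    return [v / total for v in new_x]


def simulate(x0, A, dt, steps):
    """Iterate replicator_step and return the final strategy distribution."""
    x = list(x0)
    for _ in range(steps):
        x = replicator_step(x, A, dt)
    return x


if __name__ == "__main__":
    # Hypothetical coordination game: action 0 pays 3 against itself, action 1 pays 1.
    A = [[3.0, 0.0],
         [0.0, 1.0]]
    print(simulate([0.5, 0.5], A, dt=0.01, steps=2000))
```

In the continuous setting described in the abstract, the probability vector `x` would be replaced by a probability measure and the sums by integrals.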

arXiv.org e-Print Archive

Crossref

Learning with Whom to Communicate Using Relational Reinforcement Learning

Author: A. Finzi
A. Guerra-Hernández
I.A. Letia
J. Hu
K. Driessens
K. Tuyls
L. Panait
M. Otterlo van
M. Puterman
P. Stone
R. Bellman
R. Sutton
S. Džeroski
S. Muggleton
T. Croonenborghs
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Maastricht University Research Portal

Crossref

Dynamic Partition of Collaborative Multiagent Based on Coordination Trees

Author: D.V. Pynadath
F.C.A. Groen
J.R. Kok
L. Panait
L.E. Parker
M. Boeling
M. Christopher Gifford
P.J. 't Hoen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

In team Markov games research, it is difficult for an individual agent to calculate the reward of collaborative agents dynamically. We present a coordination tree structure whose nodes are agent subsets or single agents. Two kinds of tree weights are defined, which describe the cost of an agent collaborating with an agent subset. Using these coordination trees, we can calculate a collaborative agent subset and its minimal cost of collaboration. Experiments on a Markov game were carried out using this novel algorithm. The experimental results indicate that this method outperforms related multi-agent reinforcement-learning methods based on alterable collaborative teams.

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics

Author: A. Cavagna
A. Corl
A. De Martino
A. Tversky
A.M. Dufty Jr
A.V. Badyaev
B. Doligez
B. Kerr
B. Sinervo
C. Bleay
C. Castellano
C. Hauert
C. Kirkup
C. Mettke-Hofmann
C.E. Paquin
D. Challet
D. Helbing
E.C. Engel
E.J. Collins
E.M. Erhart
F. Wang
F. Widemo
I. Lubashevsky
J.M. Rowland
J.M. Smith
J.P. Garrahan
J.R. Kok
K. Diethelm
K. Yamasaki
L. Buşoniu
L. Galeone
L. Kirwan
L. Lehmann
L. Panait
L.D. LaDage
L.R. Squire
L.T. Lancaster
L.W. Buss
M. Koganezawa
M. Marsili
M.J. West-Eberhard
O. Ronce
P.J. Greenwood
R. Hau
R. Hertwig
R. Trivers
R.A. Johnson
R.P. Balda
S. Gibeault
S. Kanemoto
S.A. West
S.M. Gray
S.M. Shuster
S.R. Pryke
T. Borgers
T. Rhodes
T. Uller
T.A. Perkins
T.W. Fawcett
V. Gafiychuk
V.A.A. Jansen
W.-T. Fu
Y. Sato
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/04/2010

A continuous-time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness of the action value, the agents accumulate in their memory the rewards obtained from taking a specific action at each moment of time. The contribution of past rewards to the agent's current perception of action value is described by an integral operator with a power-law kernel. Finally, a fractional differential equation governing the system dynamics is obtained. The agents are considered to interact with one another implicitly via the reward of one agent depending on the choice of the other agents. The pairwise interaction model is adopted to describe this effect. As a specific example of systems with non-transitive interactions, two-agent and three-agent systems of the rock-paper-scissors type are analyzed in detail, including stability analysis and numerical simulation. Scale-free memory is demonstrated to cause complex dynamics of the systems at hand. In particular, it is shown that there can be simultaneously two modes of the system instability undergoing subcritical and supercritical bifurcation, with the latter exhibiting anomalous oscillations with the amplitude and period growing with time. Besides, the instability onset via this supercritical mode may be regarded as "altruism self-organization". For the three-agent system the instability dynamics is found to be rather irregular and can be composed of alternating fragments of oscillations that differ in their properties. Comment: 17 pages, 7 figur
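The idea of weighting past rewards with a power-law kernel can be illustrated by a discrete-time sketch. The discretisation, the normalisation by the sum of weights, and the names `power_law_value` and `alpha` are assumptions made here for illustration; the paper itself works in continuous time with an integral operator and does not prescribe this form.

```python
def power_law_value(rewards, alpha):
    """Perceived action value as a power-law weighted average of past rewards.

    rewards[-1] is the most recent reward. A reward of age k (k = 0 for the
    newest) receives weight (k + 1) ** -alpha, so old rewards fade slowly
    rather than exponentially.
    """
    if not rewards:
        raise ValueError("at least one reward is required")
    t = len(rewards)
    # The reward at index s has age t - 1 - s, hence weight (t - s) ** -alpha.
    weights = [(t - s) ** -alpha for s in range(t)]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


if __name__ == "__main__":
    history = [0.0] * 20 + [1.0] * 5  # the action only started paying off recently
    for alpha in (0.0, 0.5, 2.0):
        print(alpha, power_law_value(history, alpha))
```

Setting `alpha` to 0 recovers a plain average over the whole history, while larger values put more weight on recent rewards.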

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Research Papers in Economics

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

Author: A Aydemir
A Elfes
A Giusti
A Tampuu
B Kuipers
C Cadena
C Tomasi
CL Giles
FS Melo
J Canny
JK Gupta
K Konolige
L Busoniu
L Panait
LE Kavraki
M Jaderberg
MG Bellemare
RC Smith
RS Sutton
S Daftry
S Hochreiter
V Mnih
Publication date: 09/07/2020

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync. Comment: Accepted to ECCV 2020 (spotlight); Project page: https://unnat.github.io/cordial-syn

arXiv.org e-Print Archive

Crossref

Building collaboration in multi-agent systems using reinforcement learning

Author: A Colorni
A Kazemi
A Kouider
C Watkins
G Tesauro
H Iima
J Bradtke
J Kennedy
J Vazquez-Salceda
JN Tsitsiklis
L Bull
L Panait
LM Hercog
M Gath
M Kolp
M Tasgetiren
MB Ayhan
ME Aydin
R Poli
RS Sutton
S Mohebbi
U Wilensky
X Dong
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

© Springer Nature Switzerland AG 2018. This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through the standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration is formulated to be achieved among the agents via competition, where the agents are expected to balance their actions in such a way that none of them drifts away from the team and none intrudes on a neighbour's territory. Particles are equipped with Q-learning for self-training, to learn how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that substantive collaboration can be built via the proposed learning algorithm.

Crossref

UWE Bristol Research Repository